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Abstract 

Image formation for radio astronomy can be defined as estimating the spatial power distribution of celestial 
sources over the sky, given an array of antennas. One of the challenges with image formation is that the problem 
becomes ill-posed as the number of pixels becomes large. The introduction of constraints that incorporate a-priori 
knowledge is crucial. In this paper we show that in addition to non-negativity, the magnitude of each pixel in 
an image is also bounded from above. Indeed, the classical “dirty image” is an upper bound, but a much tighter 
upper bound can be formed from the data using array processing techniques. This formulates image formation as a 
least squares optimization problem with inequality constraints. We propose to solve this constrained least squares 
problem using active set techniques, and the steps needed to implement it are described. It is shown that the least 
squares part of the problem can be efficiently implemented with Krylov subspace based techniques, where the 
structure of the problem allows massive parallelism and reduced storage needs. The performance of the algorithm 
is evaluated using simulations. 


Index Terms 

Radio astronomy, array signal processing, constrained optimization, Krylov subspace, LSQR, MVDR, image 
deconvolution 

Image formation for radio astronomy can be defined as estimating the spatial power distribution of 
celestial sources over the sky. The data model (“measurement equation”) is linear in the source powers, 
and the resulting least squares problem has classically been implemented in two steps: formation of a 
“dirty image”, followed by a deconvolution step. In this process, an implicit model assumption is made 
that the number of sources is discrete, and subsequently the number of sources has been replaced by the 
number of image pixels (assuming each pixel may contain a source). 

The deconvolution step becomes ill-conditioned if the number of pixels is large H]. Alternatively, the 
directions of sources may be estimated along with their powers, but this is a complex non-linear problem. 
Classically, this has been implemented as an iterative subtraction technique, wherein source directions are 
estimated from the dirty image, and their contribution is subtracted from the data. This mixed approach 
is the essence of the CLEAN method proposed by Hogbom 0, which was subsequently refined and 
extended in several ways, leading to the widely used approaches described in 0, 0. 

The conditioning of the image deconvolution step can be improved by incorporating side information 
such as non-negativity of the image [jSJ, source model structure beyond simple point sources (e.g., shapelets 
and wavelets 0), and sparsity or i\ constraints on the image 0, 0. Beyond these, some fundamental 
approaches based on parameter estimation techniques have been proposed, such as the Least Squares 
Minimum Variance Imaging (LS-MVI) 0 and maximum likelihood based techniques iflOl . Computational 
complexity is a concern and this has not been addressed in these approaches. 

New radio telescopes such as the Low Frequency Array (LOFAR), the Allen Telescope Array (ATA), 
Murchison Widefield Array (MWA) and the Fong Wavelength Array (FWA) are composed of many 
stations (each station made up of multiple antennas that are combined using adaptive beamforming), and 
the increase in number of antennas and stations continues in the design of the square kilometer array 
(SKA). These instruments have or will have a significantly increased sensitivity and a larger field of view 
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compared to traditional telescopes, leading to many more sources that need to be taken into account. They 
also need to process larger bandwidths to reach this sensitivity. Besides the increased requirements on the 
performance of imaging, the improved spatial resolution leads to an increasing number of pixels in the 
image, and the development of computationally efficient techniques is critical. 

To benefit from the vast literature related to solving least square problems, but also to gain from the non¬ 
linear processing offered by standard deconvolution techniques, we propose to reformulate the imaging 
problem as a parameter estimation problem described by a weighted least squares optimization problem 
with several constraints. The first is a non-negativity constraint, which would lead to the non-negative 
least squares algorithm (NNLS) proposed in [01. But we show that the pixel values are also bounded 
from above. A coarse upper bound is provided by the classical dirty image, and a much tighter bound is 
the “minimum variance distortionless response” (MVDR) dirty image that was proposed in the context 
of radio astronomy in IITOl 

We propose to solve the resulting constrained least squares problems using an active set approach. 
This results in a computationally efficient imaging algorithm that is closely related to existing non-linear 
sequential source estimation techniques such as CLEAN with the benefit of accelerated convergence due 
to tighter upper bounds on the power distribution over the complete image. Because the constraints are 
enforced over the entire image, this eliminates the inclusion of negative flux sources and other anomalies 
that appear in some existing sequential techniques. 

To further reduce the computational complexity we show that the data model has a Khatri-Rao structure. 
This can be exploited to significantly improve the data management and parallelism compared to general 
implementations of least squares algorithms. 

The structure of the paper is as follows. In Sec. U we describe the basic data model, and in Sec. HD 
the image formation problem. A constrained least squares problem is formulated, using various power 
constraints that take the form of dirty images. The solution of this problem using active set techniques in 
Sec. Hm generalizes the classical CLEAN algorithm. In Sec. [IV] we discuss the efficient implementation of 
a key step in the active set solution using Krylov subspaces. We end up with some simulated experiments 
demonstrating the advantages of the proposed technique and conclusions regarding future implementation. 


Notation 

A boldface letter such as a denotes a column vector, a boldface capital letter such as A denotes a 
matrix. We will frequently use indexed matrices A k and let a ; / , be the Ah column of A k , whereas a t ± 
is the Ah element of the vector a fc . I is an identity matrix of appropriate size and I p is a p x p identity 
matrix. 

(■) T is the transpose operator, (■)* is the complex conjugate operator, {■) u is the Hermitian transpose, 
|| • |If is the Frobenius norm of a matrix, ||.|| is the two norm of a vector and £{■} is the expectation 
operator. 

A calligraphic capital letter such as X represents a set of indices, 

and a x is a column vector constructed by stacking the elements of a that belong to X. The corresponding 
indices are stored with the vector as well (similar to the storage of matlab “sparse” vectors). 

vect(-) stacks the columns of the argument matrix to form a vector, vectdiag(-) stacks the diagonal 
elements of the argument matrix to form a vector, diag(-) is a diagonal matrix with its diagonal entries 
from the argument vector (if the argument is a matrix diag(-) = diag(vectdiag(-))). 

Let 0 denote the Kronecker product, o the Khatri-Rao product (column-wise Kronecker product), and 
0 the Hadamard (element-wise) product. The following properties are used throughout the paper (for 
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matrices and vectors with compatible dimensions): 

(B r ® A)vect(X) 
(B ® A) h 
(B® A)" 1 
(B r o A)x 

(BC ® AD) 
(BCo AD) 

(B^COA^D) 

vectdiag(A^XA) 


vect(AXB) 

(B h ® A") 
(B" 1 ® A’ 1 ) 
vect(Adiag(x)B) 
(B® A)(C® D) 
(B® A)(CoD) 
(BoAf(CoD) 
(A* o A) // vect(X) 


I. Data Model 

We consider an instrument where P receivers (stations or antennas) are observing the sky. Assuming a 
discrete point source model, we let Q denote the number of visible sources. The received signals at the 
antennas are sampled and subsequently split into narrow sub-bands. For simplicity, we will consider only 
a single sub-band in the rest of the paper. Although the sources are considered stationary, because of the 
earth’s rotation the apparent position of the celestial sources will change with time. For this reason the 
data is split into short blocks or “snapshots” of N samples, where the exact value of N depends on the 
resolution of the instrument. 

We stack the output of the P antennas at a single sub-band into a vector y k[n], where n — 1, • ■ ■ , N 
denotes the sample index, and k — 1, ■ ■ • , K denotes the snapshot index. 

The signals of the gth source arrive at the array with slight delays for each antenna which depend on 
the source direction and the earth rotation (the geometric delays), and for sufficiently narrow sub-bands 
these delays become phase shifts. Let a Qi k denote this array response vector towards the gth source at the 
k\h snapshot. We assume that it is normalized such that a^ k a q j ; = 1. In this notation, we use a tilde to 
denote parameters and related matrices that depend on the ‘true’ direction of the sources. However, in 
most of the paper we will work with parameters that are discretized on a grid, in which case we will drop 
the tilde. The grid points correspond to the image pixels and do not necessary coincide with the actual 
positions of the sources. 

Assuming an array that is otherwise calibrated, the received antenna signals y/,-[n] can be modeled as 

y k [n\ = A fc x[n] + n k [n], n = 1, • • • , N (1) 

where A/, is a P x Q matrix whose columns are the array response vectors a q ^, x[n] is a Q x 1 vector 
representing the signals from the sky, and rifc[n] is a P x 1 vector modeling the noise. 

From the data, the system estimates covariance matrices (also known as visibilities) of the input vector 
at each snapshot k — 1, ■ • • , K, as 

1 N 

Rfe = jr^2yk[ri\y k [ri\ H , k = 1, •••,/!. (2) 

n —1 

Since the received signals and noise are Gaussian, these covariance matrix estimates form sufficient 
statistics for the imaging problem ITOl . The covariance matrices are given by 

R k = £{ykyk} ( 3 ) 


for which the model is 


Rfc — AfcSA^ + R ni , 


(4) 
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where £ = and R n / = £{n fc n^} are the source and noise covariance matrices, respectively. 

We have assumed that sky sources are stationary, and if we also assume that they are independent, we 
can model £ = diag(er) where 

a = [ai a Q \ (5) 

represents the power of the sources. Vectorizing both sides of © we obtain 

r k = (A* k o A k )a + r nifc (6) 


where r k = vect(Rfc) and r n k = vcct(R n / i: ). After stacking the vectorized covariances for all of the 
snapshots we obtain 

r = \!><x + r n (7) 


where 



'rr 


A* o Ai 



r = 

Jk. 

, * = 

i 

>< 
... o 

, r n = 

An ,K. 


( 8 ) 


Similarly we vectorize and stack the sample covariance matrices as 


r k = vect(R fc ), r = 


ri 


Vk. 


(9) 


Instead of ([7]), we can use the independence between the time samples and also write the aggregate 
data model as 


R 


Ri 

0 


0 

0 

Rk. 


Q 


oA’J^oAf + Rn, 
q =1 


where 

and 


A q — ) Q — 1) ■ ■ ■ j Q 


Rn = 


Rn.i • • • 0 

0 

0 ... R n ,A'. 


( 10 ) 


( 11 ) 


( 12 ) 


II. The Imaging Problem 

Using the data model ©, the imaging problem is to find the spatial power distribution <r of the 
sources, along with their directions represented by the matrices A k , from given sample covariance matrices 
Rfc, k — 1, • ■ • , K. As the source locations are generally unknown, this is a complicated (non-linear) 
direction-of-arrival estimation problem. 

The usual approach in radio astronomy is to define a grid for the image, and to assume that each pixel 
(grid location) contains a source. In this case the source locations are known, and estimating the source 
powers is a linear problem, but for high-resolution images the number of sources may be very large. The 
resulting linear estimation problem is often ill-conditioned unless additional constraints are posed. 
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A. Gridded Imaging Model 

After defining a grid for the image and assuming that a source exists for each pixel location, let / 
denote the total number of sources (pixels), cr an / x 1 vector containing the source powers, and A/,. 
(k — 1, • • • , K) the P x / array response matrices for these sources. Note that the A fc are known, and that 
cr can be interpreted as a vectorized version of the image to be computed; we dropped the ‘tilde’ in the 
notation to indicate the difference between the gridded pixel locations and the true (and unknown) source 
locations. The Ah column of A k is a iik , and similar to we assume that = 1 for i = 1, • • • , I 

and k — 1, • • ■ , K. 

The corresponding data model is now 

Rfc = A fc diag(<r)Af + R n , fc , (13) 

or in vectorized and stacked form (replacing ©) 


r = H/rr + r n , (14) 

or in blockdiagonal form (replacing (fTOl) ) 

i 

R = <Ti{l K o A 4 )(I a ^ O A 1 ) h + R n , (15) 

2—1 

where 

A-k [^-1,/c) •) ^-/,/c] i & 1? * * ’ ? K (16) 

* = [(A* 1 oA 1 ) T ,---,(A),oA a -) T ] T (IV) 

A'=[a M ... a^], i = !,■■■,/. (18) 


For a given observation r in ([9]). image formation amounts to the estimation of cr. For a sufficiently 
fine grid, cr approximates the solution of the discrete source model. However, as we will discuss later, 
working in the image domain leads to a gridding related noise floor. This is solved by fine adaptation of 
the location of the sources and estimating the true locations in the visibility domain. 


B. Unconstrained Least Squares Images 

If we ignore the term r n , then ( 031 ) directly leads to Least Squares (LS) and Weighted Least Squares 
(WLS) estimates of cr [HI. In particular, solving the imaging problem with LS leads to the minimization 
problem 

1 12 (19) 


min -I r — \Fcr 

<r 2K 


It is straightforward to show that the solution to this problem is given by any cr that satisfies 

Hlsct = o-mf 


( 20 ) 


where we define the “matched filter” (MF, also known as the classical “direct Fourier transform dirty 
image”) as 

<5mf = —= — ^2 vectdiag(Af R fc A fc ), (21) 


and the deconvolution matrix FL « as 


Hls = T*"* = i y](AjA* t ) © (A»A k ). 


(22) 
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Similarly we can define the WLS minimization as 

min ^||(R- T/2 ® R- 1/2 )(f - *<t) || 2 , (23) 

<r ZK 

where the weighting assumes Gaussian distributed observations. The weighting improves the statistical 
properties of the estimates, and R is used instead of R because it is available and gives asymptotically 
the same optimal results, i.e., convergence to maximum likelihood estimates [fTTI . The solution to this 
optimization is similar to the solution to the LS problem and is given by any er that satisfies 


Hwrs rr — <5" wls ) 


(24) 


where 


is the “WLS dirty image” and 


^WLS = -^^(R^®R^)f 
Hwls = ^^(R' T ®R' 1 )^. 

K 


(25) 

(26) 


is the associated deconvolution operator. 

A connection to beamforming is obtained as follows. The ith pixel of the “Matched Filter” dirty image 
in equation (12TI) can be written as 


OmF ,i — 


K 


^ 1 a i,k^k a i,k 


and if we replace a ijk /\/K by a more general “beamformer” w ik , this can be generalized to a more 
general dirty image 

Ow ,i = ^2 W i!k^k W i.,k 
k 

Here, w ik is called a beamformer because we can consider that it acts on the antenna vectors y k [n\ as z t j. = 
w^.y/, [n ], where z i>k is the output of the (direction-dependent) beamformer, and = J2k^{\ z i,k\ 2 } 1S 
interpreted as the total output power of the beamformer, summed over all snapshots. We will encounter 
several such beamformers in the rest of the paper. 


C. Preconditioned Weighted Least Squares 

If has full column rank then H LS and H WLS are non-singular and there exists a unique solution to 
LS and WLS. For example the solution to ([20b becomes 

er = Hls<t mf . (27) 

Unfortunately, if the number of pixels is large then H L s and H W ls become ill-conditioned or even 
singular, so that (l20l) and (l24l) have an infinite number of solutions [[Q. Generally, we need to improve 
the conditioning of the deconvolution matrices and to find appropriate regularizations. 

One way to improve the conditioning of a matrix is by applying a preconditioner. The most widely 
used and simplest preconditioner is the Jacobi preconditioner [fl2l which, for any matrix M, is given by 
|diag(M)]" Let Dwl S = diag(HwLs), then by applying this preconditioner to H WLS we obtain 

[D W l S Hwls]ct = D W l S <t wls . (28) 

We take a closer look at D w J s <t WL s for the case where K = 1. In this case 

Hwls = (At o Ax)^(Rr T (8) Rr'XAt o A,) 

= (A T Rr T At) 0 (Af Rf 1 A 1 ) 
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and 


D -1 — 

^WLS — 


1 

( a M®l la w) 2 


( a Fi R i a i, i) 2 _ 


This means that 


■^wls^wls — D wls (R 1 <g) R, )(Aj o Ai) fi 


= (Rr' J 'A;D^ 2 o ri'a.d; 

which is equivalent to a dirty image that is obtained by applying a beamformer of the form 

1 


»-l/2 \H- 


'WLS ) r l 


W i = 


af,R, 1 a itl 


Ri 


(29) 


to both sides of Ri and stacking the results, cr, = w^R, w,, of each pixel into a vector. This beamformer 
is known in array processing as the Minimum Variance Distortionless Response (MVDR) beamformer 
Ifl3l . and the corresponding MVDR dirty image was introduced in the radio astronomy context in lUOl . 

D. Bounds on the Image 

Another approach to improve the conditioning of a problem is to introduce appropriate constraints on 
the solution. Typically, image formation algorithms exploit external information regarding the image in 
order to regularize the ill-posed problem. For example maximum entropy techniques ffl4ll . lfT51 impose a 
smoothness condition on the image while the CLEAN algorithm [O exploits a point source model wherein 
most of the image is empty, and this has recently been connected to sparse optimization techniques {8). 

A lower bound on the image is almost trivial: each pixel in the image represents the power coming 
from a certain direction, hence is non-negative. This leads to a lower bound cr > 0. Such a non-negativity 
constraint has been studied for example in 0, resulting in a non-negative LS (NNLS) problem 


min -Ilf — (E'er 

<r 2 K" 1 

subject to 0 < cr 


(30) 


A second constraint follows if we also know an upper bound 7 such that cr < 7 , which will bound the 
pixel powers from above. We will propose several choices for 7 . 

Actual dirty images are based on the sample covariance matrix R and hence they are random variables. 
By closer inspection of the ith pixel of the MF dirty image <x MF , we note that its expected value is given 

by 1 

0MF ,i = JT a U- K >^-k ■ 


Using 


a, = vect(A’) = [af 


T 

i,l 


and the normalization aj i k a l j,. = 1, we obtain 


a IkY , 


1 „ 1 „ 

^"z H rT'^-z 

A A 


(31) 


(32) 


Rr = I>7(I*oA J )(I 


au h + r„ 


-K o A 3 ) 


where (cf. (1131) ) 


(33) 
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is the contribution of all other sources and the noise. Note that R r is positive-(semi)definite. Thus, (l32l) 
implies <7 M F,i > which means that the expected value of the MF dirty image forms an upper bound for 
the desired image, or 

a < ct M f • (34) 

Now let us define a beamformer w M f,; = f^a,, then we observe that each pixel in the MF dirty image 
is the output of this beamformer: 

CHviF,* = W^jRwMF,;. (35) 

As indicated in Sec. III-Bl we can extend this concept to a more general beamformer w, : . The output power 

of this beamformer, in the direction of the ith pixel, becomes 

Ow,i = wf Rwj = (Tjwf (l K O A 1 ) (I A - o Af ^w, + wfR r w ?; . (36) 

If we require that 

wf (Ik o A*) (Ik o A ? )"w ?: = 1 (37) 

we have 

C>w ,i = CTi + wf R r Wj . (38) 

As before, the fact that R, is positive definite implies that 

<?i < o’w.i ■ (39) 

We can easily verify that w M f,« satisfies 071) and hence cr M F,i is a specific upper bound. A question 
which arises at this point is: What is the tightest upper bound for cq that we can construct using linear 
beamforming? We can translate this to the following optimization question: 

Oopt ,i = min wf Rwj (40) 

W i 

s.t. wf (Ik o A') (Ik ° A^w; = 1 

where a opL) ; would be this tightest upper bound. This problem can be solved (Appendix. Eb: the tightest 
upper bound is given by 

Oonu = min 
k 

and the beamformer that achieves this was called the adaptive selective sidelobe canceller (ASSC) in IjTTIl . 
One problem with using this result in practice is that cr op t,i depends on a single snapshot. This means 
that there is a variance-bias trade-off when we have a sample covariance matrix R instead of the true 
covariance matrix R. An analysis of this problem and various solutions for it are discussed in ifTTIl . 

To reduce the variance we will tolerate an increase of the bound with respect to the tightest, however 
we would like our result to be tighter than the MF dirty image. For this reason we suggest to find a 
beamformer that instead of (1371) satisfies the slightly different normalization constraint 

wfa i = VK. (42) 

We will show that the expected value of the resulting dirty image constitutes a larger upper bound than 
the ASSC m . but because the output power of this beamformer depends on more than one snapshot it 
will have a lower variance than ASSC, so that it is more robust in practice. 

With this constraint, the beamforming problem is 

Wj = arg min wf Rw, 

W i 

s.t. wfa i = \[K 



(43) 





which is recognized as the classical minimum variance distortionless response (MVDR) beamforming 
problem [fl3l . Thus, the solution is given in closed form as 


w MVDR,i 

and the resulting MVDR dirty image is 


y/K 


i^R 1 . 


-R 


-i. 


°MVDR,i — w MVDR,i Rw MVDR,i 
K a ^k R k la ^k 

(Efc a a: R fc ’ a ^) 2 

1 

i Efc a ffc R fc la *,fc 


(44) 


(45) 


Interestingly, for K = 1 this is the same image as we obtained earlier by applying a Jacobi preconditioner 
to the WLS problem. To demonstrate that this image is still an upper bound we show that 


a : = wf {1 K o A 1 )(lx o A*) fi w.j > 1. 


(46) 


Indeed, inserting (l44l) into this inequality gives 

j^afR-^IxoA^lKoA^R-iai 

(afR-'a,;) 2 

Efc( a f fc R f la i,fc) 2 

— K _hlh_ > k _1_ = 1 

h T l K l£h - ^ A max (lifl^) ’ 


(47) 


where h = (l K o A') H H 'a, is a K x 1 vector with entries ///, = a^.R^, 1 a,j, and A max (-) is the largest 
eigenvalue of of the argument matrix. Hence, a similar reasoning as in (l36l) gives 


CmVDR,* — ttCTj + W MVDRji R r W MV DR,i > a i 


which is 


O' < (T MVDR • 


(48) 


Note that w MF ,i also satisfies the constraint in (1431) . i.e. w(( |: ): a, = \f~K, but does not necessary minimize 
the output power wfRwj, therefore the MVDR dirty image is smaller than the MF dirty image: (t MV dr < 
<t mf . Thus it is a tighter upper bound. This relation also holds if R is replaced by the sample covariance 
R. 


Estimation of the Upper Bound from Noisy Data 

The upper bounds (l34l) and (l48l) assume that we know the true covariance matrix R. However in practice 
we only measure R which is subject to statistical fluctuations. Choosing a confidence level of 6 times 
the standard deviation of the dirty images ensures that the upper bound will hold with probability 99.9%. 
This leads to an increase of the upper bound by a factor 1 + a where a > 0 is chosen such that 


cr < (1 + a) 6 mf• 

Similarly, for the MVDR dirty image the constraint based on R is 

cr < (1 + a) <t M vdr 


where 


OMVDR,* 


c 


K £/.' a, - fcR /.' a 


■i.k 


(49) 

(50) 

(51) 










10 


is an unbiased estimate of the MVDR dirty image, and 


C = 


N 


(52) 


N — p 

is a bias correction constant. With some algebra the unbiased estimate can be written in vector form as 

< 5 "mvdr = D 1 SE r ' ff (R T 0 R 1 )r, (53) 

where 


D 


1 


KC 


diag 2 A H K 1 A ) , 


(54) 


and 


A = [Af ... A t k ] T 
= [ai ... aj] . 

The exact choice of a and C are discussed in Appendix [Al 


(55) 


E. Constrained Least Squares Imaging 

Now that we have lower and upper bounds on the image, we can use these as constraints in the LS 
imaging problem to provide a regularization. The resulting constrained LS (CLS) imaging problem is 


min-Ilf — tPerll 2 

<r 2/1 11 11 

s.t. 0 < <r < 7 


(56) 


where 7 can be chosen either as 7 = <r MF f° r the MF dirty image or 7 = <t MV dr for the MVDR dirty 
image. 

The improvements to the unconstrained LS problem that where discussed in Sec. lII-Bl are still applicable. 
The extension to WLS leads to the cost function 


/wls(o-) = \\m - T/2 ® RL 1/2 ) (r - *<T) II 2 • (57) 

The constrained WLS problem is then given by 

min fwLs(rr) 

J (58) 

S.t. 0 < (7 < 7 . 

We also recommend to include a preconditioner which, as was shown in Sec lII-C[ relates the WLS to 
the MVDR dirty image. However, because of the inequality constraits, (|58T) does not have a closed form 
solution and it is solved by an iterative algorithm. In order to have the relation between WLS and MVDR 
dirty image during the iterations we introduce a change of variable of the form & = Der, where & is the 
new variable for the preconditioned problem and the diagonal matrix D is given in (l54l) . The resulting 
constrained preconditioned WLS (PWLS) optimization problem is 

<r = arg min —1| (R~ T/2 ® RT 1/2 ) (f - \PD - 1 <t) || 2 
s.t. 0 < & < D 7 


and the final image is found by setting a = D” 1 ^. (Here we used that D is a positive diagonal matrix so 
that the transformation to an upper bound for & is correct.) Interestingly, the dirty image that follows from 
the (unconstrained) Weighted Least Squares part of the problem is given by the MVDR image <t M vdr in 

m. 
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III. Constrained optimization using an active set method 

The constrained imaging formulated in the previous section requires the numerical solution of the 
optimization problems (l56l) or (|59|) . The problem is classified as a positive definite quadratic program with 
simple bounds, this is a special case of a convex optimization problem with linear inequality constraints, 
and we can follow standard approaches to find a solution Ifl 8 l . Ifl9ll . 

For an unconstrained optimization problem, the gradient of the cost function calculated at the solution 
must vanish. If in an iterative process we are not yet at the optimum, the gradient is used to update the 
current solution. For constrained optimization, the constraints are usually added to the cost function using 
(unknown) Lagrange multipliers that need to be estimated along with the solution. At the solution, part of 
the gradient of the cost function is not zero but related to the nonzero Lagrange multipliers. For inequality 
constraints, the sign of the Lagrange multipliers plays an important role. 

In this Section, we use an approach called the active set method to solve the constrained optimization 
problem. 

A. Characterization of the Optimum 

Let o’ be the solution to the optimization problem (1561) or (l59l) . An image is called feasible if it satisfies 
the bounds cr > 0 and — cr > — 7 . At the optimum, some pixels may satisfy a bound with equality, and 
these are called the “active” pixels. 

We will use the following notation. For any feasible image cr, let 


£(cr) 

= {i Oi = 0 } 

(60) 

U(o) 

= {i | © = li} 

(61) 

A{o) 

= £(cr) U U(cr) 

(62) 

X{o) 

= X \ A(cr) . 

(63) 


X — {1, is the set of all pixel indices, C{er) is the set where the lower bound is active, i.e., the 

pixel value is 0. U(o) is the set of pixels which attain the upper bound. A(cr) is the set of all pixels 
where one of the constraints is active, these are the active pixels. Finally, the free set T{&) is the set of 
pixels i which have values strictly between 0 and 7 Further, for any vector v = fry], let vjr correspond 
to the sub vector with indices ? G T, and similarly define V£ and v w . We will write v = v?- © V£ © vy. 

Let 0 be the optimum, and let g = g(rr) be the gradient of the cost function at this point. Define the 
free sets and active sets X. C. U at cr. We can write g = gj- © gc © gy. Associated with the active pixels 
of 0 is a vector A — of Lagrange multipliers. Optimization theory [jT 8 l tells us that the optimum 

cr is characterized by the following conditions: 

gj-(d-) = 0 (64) 

Az: = gc > 0 (65) 

A u = —gw > 0 . (66) 

Thus, the part of the gradient corresponding to the free set is zero, but the part of the gradient corresponding 
to the active pixels is not necessarily zero. Since we have simple bounds, this part becomes equal to the 

Lagrange multipliers \ c and —A u (the negative sign is caused by the condition —cr u > —j u ). The 

condition A > 0 is crucial: a negative Lagrange multiplier would indicate that there exists a feasible 
direction of descent p for which a small step into that direction, cr + pp, has a lower cost and still 
satisfies the constraints, thus contradicting optimality of cr [|T 8 l . 

“Active set” algorithms consider that if the true active set at the solution would be known, the opti¬ 
mization problem with inequality constraints reduces to an optimization with equality constraints, 

z = argrnin /(cr) 

cr 

s.t. cr c = 0 , o u = 7 W . 


(67) 
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Since we can substitute the values of the active pixels into cr, the problem becomes a standard uncon¬ 
strained LS problem with a reduced dimension: only needs to be estimated. Specifically, for CLS the 
unconstrained subproblem is formulated as 



/( CT ) = 7^ll b LS - ^CT.f|| 2 

(68) 

where 




b LS = r — 'S’uvu- 

(69) 

Similarly for PWLS we have 



f(&) = \ 

bpWLS - (r~ T/2 ® R^ 1/2 ) (^D -1 )^^ 

(70) 

where 



bpWLS 

= (r- t/2 ® R~ 1/2 ) (f - 

(71) 


In both cases, closed form solutions can be found, and we will discuss a suitable Krylov-based algorithm 
for this in Sec. |IV| 

Hence the essence of the constrained optimization problem is to find C. U and J~. In the literature 
algorithms for this are called active set methods, and we propose a suitable algorithm in Sec. IIII-Cl 

B. Gradients 

We first derive expressions for the gradients required for each of the unconstrained subproblems (1681) 
and (l7Ql) . Generically, a WLS cost function (as function of a real-valued parameter vector 6) has the form 

m wls = /3||G 1 / 2 c(0 )|| 2 = f3c(6) H Gc(d ) (72) 

where G is a Hermitian weighting matrix and (3 is a scalar. The gradient of this function is 

s(e) " w {w) Gc <73) 

For LS we have 6 = a, c = r — Sficr, /3 = ^ and G = I. This leads to 

gLs(cr) = _ **) 

= H ls <t — <x MF . (74) 

For PWLS, 6 = <x, c = r — \PD [3 = \ and G = Rr T ® R _1 . Substituting into d73l) we obtain 

grwLs(d-) = -T>- 1 9 h (BT t ® R -1 )(? - ^D’ 1 ^) 

= H PWLS <t — o'mvdr (75) 

where 

Hpwls = D-^^(R' t ® R” 1 )\E r D _1 , (76) 

and we used (l53l) . 

An interesting observation is that the gradients can be interpreted as residual images obtained by 
subtracting the dirty image from a convolved model image. This will at a later point allow us to relate 
the active set method to sequential source removing techniques. 
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TABLE I: Constrained LS Imaging Using Active Sets 

1: Initialize: set the initial image er (0 ^ = 0, j — 0, set the free set T = 0, and C,U accordingly 
2: Set the flag Freegradient-isnotzero := True 
3: while Freegradient-isnotzero or A m in < 0 do 
4: if Freegradient-isnotzero then 

5: Let z be the solution of the unconstrained subproblem ( 167b 

6: if z is feasible then 

7: Update the image: er^ +1 ^ = z 

8: Set Freegradient-isnotzero := False 

9: Compute the “active” part of the gradient and estimate the Lagrange multipliers 

10: Let A m in be the smallest Lagrange multiplier and i m i n the corresponding pixel index 

11: else 

12: Compute the direction of descent p = z — a 

13: Compute the maximum feasible nonnegative step-size /i max and let i be the corresponding pixel index that will attain a bound 

(7+1) (i) 

14: Update the image: cr y = cry + /r m ax p 

15: Add a constraint: move i from the free set JF to £ or U 

16: Set Freegradient-isnotzero := True 

17: end if 

18: Increase the image index: j := j + 1 

19: else 

20: Delete a constraint: move i m j n from £ or U to the free set T 

21: Set Freegradient-isnotzero := True 

22: end if 

23: end while 


C. Active Set Methods 

In this section, we describe the steps needed to find the sets C, U and J~, and the solution. We follow the 
template algorithm proposed in IfTSTl . The algorithm is an iterative technique where we gradually improve 
on an image. Let the image at iteration j be denoted by crwhere j = 1,2,---, and we always ensure 
this is a feasible solution (satisfies 0 < cr d) < 7). The corresponding gradient is the vector g = g(cr^), 
and the current estimate of the Lagrange multipliers A is obtained from g using (1651) . (1661 ) . The sets C, 
U and T are current estimates that are not yet necessarily equal to the true sets. 

If this image is not yet the true solution, it means that one of the conditions in (I64l)-(l66l ) is violated. If 
the gradient corresponding to the free set is not yet zero (gj- ^ 0 ), then this is remedied by recomputing 
the image from the essentially unconstrained subproblem (l67l) . It may also happen that some entries of 
A are negative. This implies that we do not yet have the correct sets C, U and T. Suppose \ < 0. The 
connection of \ to the gradient indicates that the cost function can be reduced in that dimension without 
violating any constraints lfl 8 l . at the same time making that pixel not active anymore. Thus we remove 
the i\h pixel from the active set, add it to the free set, and recompute the image with the new equality 
constraints using (l67l) . As discussed later, a threshold e is needed in the test for negativity of \ and 
therefore this step is called the “detection problem”. 

Table Q] summarizes the resulting active set algorithm and describes how the solution z to the subproblem 
is used at each iteration. Some efficiency is obtained by not computing the complete gradient g at every 
iteration, but only the parts corresponding to CM, when they are needed. Lor the part corresponding T, 
we use a flag that indicates whether gj- is zero or not. 

In line 1, the iterative process is initialized. This can be done in many ways. As long as the initial image 
lies within the feasible region (0 < er(°) < 7 ), the algorithm will converge to a constrained solution. We 
can simply initiali z e by rr '" 1 = 0 . 

Line 3 is a test for convergence, corresponding to the conditions (l64l) - (l66l) . The loop is followed while 
a condition is violated. 

If gj- is not zero, then in line 5 the unconstrained subproblem (1671 ) is solved. If this solution z satisfies 
the feasibility constraints, then it is kept, the image is updated accordingly, and the gradient is estimated 
at the new solution (only A min = min (A) is needed, along with the corresponding pixel index). 
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If z is not feasible, then in line 12-16 we try to move into the direction of z as far as possible. The 
direction of descent is p = z — cry, and the update will be a^ = cry + /rp, where h is a non-negative 
step size. The ith pixel will hit a bound if either a- 1 ' 1 + /ip,; = 0 or ap + ///>, = 7 j, i.e., if 


Hi = max 



(77) 


(note that Hi ' s non-negative). Then the maximal feasible step size towards a constraint is given by 
/Umax = min(pj), for i G 77 The corresponding pixel index is removed from T and added to C or LA. 

If in line 3 the gradient satisfied gj- = 0 but a Lagrange multiplier was negative, we delete the 
corresponding constraint and add this pixel index to the free set (line 20). After this, the loop is entered 
again with the new constraint sets. 

Suppose we initialize the algorithm with cr iM) = 0, then all pixel indices will be in the set C, and 
the free set is empty. During the first iteration remains empty but the gradient is computed (line 9). 
Equations (1741) and (1751) show that it will be equal to the negated dirty image. Thus the minimum of the 
Lagrange multipliers A min will be the current strongest source in the dirty image and it will be added 
to the free set when the loop is entered again. This shows that the method as described above will lead 
to a sequential source removal technique similar to CLEAN. In particular, the PWLS cost function (1751) 
relates to LS-MVI [J9l, which applies CLEAN-like steps to the MVDR dirty image. 

In line 3, we try to detect if a pixel should be added to the free set (A min < 0). Note that A follows 
from the gradient, (1741) or (1751) . which is a random variable. We should avoid the occurence of a “false 
alarm”, because it will lead to overfitting the noise. Therefore, the test should be replaced by A min < —e, 
where e > 0 is a suitable detection threshold. Because the gradients are estimated using dirty images, they 
share the same statistics (the variance of the other component in (1741) and (1751) is much smaller). To reach 
a desired false alarm rate, we propose to choose e proportional to the standard deviation of the i\h pixel 
on the corresponding dirty image for the given cost function. (How to estimate the standard deviation of 
the dirty images and the threshold is discussed in Appendix [A]). Choosing e to be 6 times the standard 
deviation ensures a false alarm of < 0 . 1 % over the complete image. 

The use of this statistic improves the detection and hence the estimates greatly, however the correct 
detection also depends on the quality of the power estimates in the previous iterations. If a strong source 
is off-grid, the source is usually underestimated, this leads to a biased estimation of the gradient and 
the Lagrange multipliers, which in turn leads to inclusion of pixels that are not real sources. In the next 
section we describe one possible solution for this case. 


D. Strong Off-Grid Sources 

The mismatch between and the unknown results in an underestimation of source powers, which 
means that the remaining power contribution of that source produces bias and possible artifacts in the 
image. In order to achieve high dynamic ranges we suggest finding a grid correction for the pixels in the 
free set T. 

So far we have not introduced a specific model for the elements in the matrix A k , but in order to be 
able to correct for these gridding mismatches we assume that the array is at least calibrated for gains such 
that we can model the columns of this steering matrix as 


&-q,h 


J_ e ^S T Q k (L,B)H q 

VP 


(78) 


where 3 is a 3 x P matrix containing the position of each receiving element, is a 3 x 3 rotation matrix 
that accounts for the earth rotations and depends on time and the observer’s longitude L and latitude B, 
f3 q is a 3 x 1 unit vector toward the direction of the source and A is the wavelength. Let a l k have the 
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same model as A q k with f3 i pointing towards the center of the zth pixel. When a source is within a pixel 
but not exactly in the center we can model this mismatch as 


&-q,k 


1 ^B t Q k (0i+8j) 

Vp 


—,t 

= a i)fc 0e»“ 


Q k$i 


where <5,; = /3 q — (3, and i e T. Because both (3, and (3 are 3 x 1 unit vectors, each has only two degrees 
of freedom. This means that we can parameterize the unknowns for the grid correcting problem using 
coefficients S it i and Si, 2 . We will assume that when a source is added to the free set, its actual position is 
very close to the center of the pixel on which it was detected. This means that 5 it i and d t 2 are within the 
pixel’s width, denoted by W, and height, denoted by H. In this case we can replace (1671) by a non-linear 
constrained optimization, 

o.cr Z 

s.t. - W/2 < Si, 1 < W/ 2 

- H/2 < 5 ^2 < H/2 (79) 

where contains only the columns corresponding to the set J~. Sj is a vector obtained by stacking 

8{,j for j = 1,2 and 

b = r — ^ w cr w . ( 80 ) 

This problem can also be seen as a direction of arrival (DOA) estimation which is an active research area 
and out of the scope of this paper. A good review of DOA mismatch correction for MVDR beamformers 
can be found in lf 2 Tl . 

Besides solving d79l) instead of (l67l) in line 5 of the active set method we will also need to update the 
upper bounds and the standard deviations of the dirty images at the new pixel positions that are used in 
the other steps (e.g., line 3, 6 and 13), the rest of the steps remain the same. Because we have a good 
initial guess to where each source in the free set is, we propose a Newton based algorithm to do the 
correction. 


IV. Implementation using Krylov Sub space based methods 

From the active set methods described in the previous section, we know that we need to solve (1681) or 
(1701) at each iteration. In this section we describe how to achieve this efficiently and without the need of 
storing the whole convolution matrix in memory. 

A. Lanczos algorithm and LSQR 

When we are solving CLS or PWLS, we need to solve a problem of the form ||b — MxH! as the first 
step of the active set iterations. For example, in (l68l) M = \Pj-. Note that it does not have to be a square 
matrix and usually it is ill-conditioned especially if the number of pixels is large. In general we can find 
a solution for this problem by first computing the singular value decomposition (SVD) of M as 

M = USV^, (81) 

where U and V are unitary matrices and S is a diagonal matrix with positive singular values. Then the 
solution x to min ||b — Mx|| 2 is found by solving for y in 

Sy = U H b 


(82) 
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followed by setting 


x = Vy. 


(83) 


Solving the LS problem with this method is expensive in both number of operations and memory usage, 
especially when the matrices U and V are not needed after finding the solution. As we will see shortly, 
looking at another matrix decomposition helps us to reduce these costs. For the rest of this section we 
use the notation given by Il22l . 

The first step in our approach for solving LS problem is to reduce M to a lower bidiagonal form as 
follows 

M = UBV h , (84) 


where B is a bidiagonal matrix of the form 



OL\ 

” 


$2 Oi2 


B = 

/3 r a r 




0 J 


(85) 


with r = rank(M) = rank(B) and U,V are unitary matrices (different than in (l8TT) f. This representation 
is not unique and without loss of generality we could choose U to satisfy 


U^b = ftei 


( 86 ) 


where fix = ||b|| 2 and ex is a unit norm vector with its first element equal to one. 

Using B, forward substitution gives the LS solution efficiently by solving y in 

By = U H b = (87) 


followed by 


x = Vy. 


Using forward substitution we have 



Xl = V!j/i, 


( 88 ) 

(89) 


followed by the recursion, 


Un +1 


fin +1 

Un 

®n +1 


Xji+1 X n T ^n+lVn+X 


(90) 

(91) 


for n = 1,, M where M < r is the iteration at which ||M U (Mx„ — b)|| 2 vanishes within the desired 
precision. We can combine the bidiagonalization and solving for x and avoid extra storage needed for 
saving B, U and V. One such algorithm is based on a Krylov subspace method called the Lanczos 
algorithm lf23l . We first initialize with 


Pi 

Ui 

Oil 


- || U ||2 

b 

= Ji 

= HM^Uil^ 

M^ui 


(92) 

(93) 

(94) 

(95) 


Vl = 
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The iterations are then given by 

Pn+l 11 Mv n a n U n ||2 

U «+! = “ anU ") ( g 6) 

a n+1 = ||M H u n+ i -/3 n+ iv n || 2 
v n +i = ^(M^u n+1 - /3 n+ iv n ) 

for n — 1,2,, M, where u^u n = v^v n = 1. This provides us with all the parameters needed to solve 
the problem. 

However because of the finite precision errors, the columns of U and V found in this way loose their 
orthogonality as we proceed. In order to prevent this error propagation into the final solution x, different 
algorithms like Conjugate Gradient (CG), MINRES, LSQR, etc. have been proposed. The exact updates 
for x„ and stopping criteria to find M depends on the choice of algorithm used and therefor is not included 
in the iterations above. 

An overview of Krylov subspace based methods, is given by [24!, pp.91]. This study shows that LSQR 
is a good candidate to solve LS problems when we are dealing with an ill-conditioned and non-square 
matrix. For this reason we will use LSQR to solve (l68l) or d70l) . Because the remaining steps during the 
LSQR updates are a few scalar operations and do not have large impact on the computational complexity 
of the algorithm we will not go into the details.(see 112211 

In the next section we discuss how to use the structure in M to avoid storing the entire matrix in 
memory and how to parallelize the computations. 

B. Implementation 

During the active set iteration we need to solve (1681) and (17(1 where the matrix M in LSQR is replaced 
by TV and (Rr T / 2 <g> R- 1/2 )(THT l )? respectively. Because T* has a Khatri-Rao structure and selecting 
and scaling a subset of columns does not change this, TV and (T'D also have a Khatri-Rao structure. 
Here we will show how to use this structure to implement (l96l) in parallel and with less memory usage. 

Note that the only time the matrix M enters the algorithm is via the matrix-vector multiplications Mv n 
and M /7 ii n+ |. As an example we will use M = F for solving (1681) . Let k n = \Evv n . We partition k n 
as 4/ into 

k n=[k£ n ... kL ,] T ■ (97) 

Using the definition of 4/ in (fT71) . the operation k n = Tv v n could also be performed using 

K k,n ^ ^ (98) 

i&F 

and subsequently setting 

k k, n = vect(K fc) „). (99) 

This process can be highly parallelized because of the independence between the correlation matrices of 
each time snapshot. The matrix K k , n can then be used to find the updates in (1961 ) . 

The operation M H u in ( 1961) . is implemented in a similar way. Using the beamforming approach (similar 
to Sec JII-DI) . this operation can also be done in parallel for each pixel and each snapshot. 

In both cases the calculations can be formulated as correlations and beamforming of parallel data 
paths which means that efficient hardware implementations are feasible. Also we can consider traditional 
LS or WLS solutions as a special case when all the pixels belong to the free set which means that 
those algorithms can also be implemented efficiently in hardware in the same way. Because during the 
calculations we work with a single beamformer at the time, the matrix 4/ need not to be pre-calculated 
and stored in the memory. This makes it possible to apply image formation algorithms for large images 
when there is a memory shortage. 
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The computational complexity of the algorithm is dominated by the transformation between the visibility 
domain and image domain (correlation and beamforming). The dirty image formation and correlation 
have a complexity of 0(Kp 2 I) this means that the worst case complexity of the active set algorithm is 
0(TMKp 2 I) where T is the number of active set iterations and M is the maximum number of Krylov 
iterations. A direct implementation of CLEAN for solving the imaging problem presented in Sec. QJ in 
similar way would have a complexity of 0(TKp 2 I). Hence the proposed algorithm is order M times more 
complex, essentially because it recalculates the flux for all the pixels in the free-set while CLEAN only 
estimates the flux of newly added pixel. In practice many implementations of CLEAN use FFT instead 
of DFT (matched filter) for calculating the dirty image. Extending the proposed method to use similar 
techniques is possible and will be presented in future works. 


True Image 



-0.2 -0.1 0 0.1 0.2 0.3 


Pi 

Fig. 1: True source 


V. Simulations 

In this section we will evaluate the performance of the proposed method using simulations. Because 
the active set algorithm adds a single pixel to the free set at each step, it is important to investigate the 
effect of this procedure on extended sources and noise. For this purpose we will use a high dynamic 
range simulated image with a strong point source and two weaker extended sources in the first part of 
the simulations. In the second part we will make a full sky image using sources from the 3C catalog. 
We use the following definitions for the coordinate systems. A fixed coordinate system based on the right 
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MF Dirty Image 



(a) MF Dirty Image 


LS Cleaned Image 




(b) Solution of the CLS image after convolu- (c) CLS Image cross-section 

tion with a Gaussian beam 



(d) MVDR dirty image 


(e) Preconditioned WLS Image after convolu- (f) MVDR dirty image and PWLS cross- 
tion with a Gaussian beam section 


Fig. 2: Extended Source Simulations 


ascension (a) and declination (5) of the sources 


P = 


cos(<5) cos(ct) 
cos(<5) sin (a) 
sin (5). 


The corresponding ( l,m,n ) coordinates s that take earth rotation into account are given by 


s = Q k (L,B)(3. 


A. Extended Sources 

An array of 100 dipoles (p = 100) with random distribution is used with the frequency range of SB- 
90 MHz from which we will simulate three equally spaced channels. Each channel has a bandwidth of 
195 kHz and is sampled at Nyquist-rate. These specification have been chosen the same as for LOFAR 
telescope in LBA modes ll25l . LOFAR uses 1 second snapshots and we will simulate using only two 
snapshots, this means that K = 2. The simulated source is a combination of a strong point source and 
two extended structures. The extended sources are composed from seven Gaussian shaped sources, one in 
the middle and 6 on a hexagon around it. Figure Q] shows the simulated image in dB scale. The background 
noise level that is added is at —10 dB which is also 10 dB below the the extended sources. 

Figures [2a] and [2d] show the matched filter and MVDR dirty images respectively. Figures [2b] and [2e] 
show the reconstructed images, after deconvolution and smoothing with a Gaussian clean beam, for the 
CLS and PWLS deconvolution with MF and MVDR dirty images as upper bounds respectively. A cross 
section of the images has been illustrated in Figures [2c] and [20 Remarks are: 
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TABLE II: Simulated Sources from the 3C Catalog 


Names 

l 

m 

Flux 

3C 461 

-0.30485 

0.19131 

11000 

3C 134 

0.59704 

-0.02604 

66 

3C 219 

0.63907 

0.6598 

44 

3C 83.1 

0.28778 

-0.13305 

28 

3C 75 

0.30267 

-0.684 

23 

3C 47 

-0.042882 

-0.51909 

20 

3C 399.2 

-0.97535 

0.20927 

19 

3C 6.1 

-0.070388 

0.47098 

16 

3C 105 

0.57458 

-0.60492 

15 

3C 158 

0.9017 

-0.12339 

14 

3C 231 

0.28956 

0.72005 

13 

3C 303 

-0.1511 

0.95402 

12.5 

3C 277.1 

0.12621 

0.93253 

12 

3C 320 

-0.3597 

0.93295 

11.5 

3C 280.1 

0.15171 

0.98709 

11 

3C 454.2 

-0.29281 

0.31322 

10.5 

3C 458 

-0.61955 

-0.56001 

10 

3C 223.1 

0.67364 

0.68376 

9.5 

3C 19 

-0.23832 

-0.30028 

9 

3C 437.1 

-0.83232 

-0.24924 

5 


• As expected the MVDR dirty image has a much better dynamic range and lower side-lobes; 

• Due to a better initial dirty image and upper bound the preconditioned WLS deconvolution gives a 
better cleaned image. However a trade-off is made between the resolution of the point source and 
the correct shape of the extended sources when we use the Gaussian beam to smoothen the image. 

• The cross sections show the accuracy of the magnitudes. This shows that not only the shape but also 
the magnitude of the sources are better estimated using PWLS. 

B. Full Sky with 3C Sources 

In this part we describe a simulation for making an all sky image. The array setup is the same as before 
with the same number of channels and snapshots. A background noise level of 0 dB (with respect to 1 
Jansky) is added to the sky. 

In this simulation we check which sources from the 3C catalog are visible at the simulated date and time. 
From these we have chosen 20 sources that represent the magnitude distribution on the sky and produce the 
highest dynamic range available in this catalog. Table (IQ shows the simulated sources with corresponding 
parameters. The coordinates are the (l,m) coordinates at the first snapshot. Because the sources are not 
necessarily on the grid point we have chosen to do the active set deconvolution in combination with grid 
correction on the free set as described in Sec. IIII-D1 

Figures [3a] and [3b] show the position and power estimates for the sources that are detected during 
the deconvolution process. Figure [3c] shows the full sky MF dirty image. The contoured version of the 
reconstructed image with minimum contour 3 dB above the noise level is shown in Figure [3d] and the 
final reconstructed image with the residual added to it is give in Figure 0] 

Remarks: 

• The algorithm stops after adding the correct number of sources based on the detection mechanism 
we have incorporated in the active set method; 

• Because of the grid correction no additional sources are added to compensate for incorrect power 
estimates on the grids; 

• All 20 sources are visible in the final reconstructed image and no visible artifacts are added to image. 

VI. Conclusions 

Based on a parametric model and power constraints, we have formulated image deconvolution as an 
optimization problem with inequality constraints which we have solved using an active set based method. 
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(a) Location Estimates 
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(b) Flux Estimates 

Imaging Result 
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(c) Full Sky MF Dirty Image (d) Reconstructed Image-scale 

Fig. 3: Point Source Simulations 


The relation between the proposed method and sequential source removing techniques is explained. The 
theoretical background of the active set methods can be used to gain better insight into how the sequential 
techniques work. 

The Khatri-Rao structure of the data model is used in combination with Krylov based techniques to 
solve the linear systems involved in the deconvolution process with less storage and complexity. We 
have introduced a preconditioned WLS cost function with a gradient that is related to the MVDR dirty 
image. Using simulation we have shown that the solution to the preconditioned WLS has improved spatial 
structure and improved power estimates. 

In this paper we have discussed the bidiagonalization and Krylov approach to solve the system of 
linear equations. The main reason for this is to reduce the storage needed for the deconvolution matrix. 
It is easy to verify that the active set updates can be translated into rank one updates and downdates of 
the deconvolution matrix. There are other matrix decompositions like QR decomposition that can take 
advantage of this fact. Knowing that the Khatri-Rao structure of the matrix does not change by adding 
or removing columns, it is interesting for future works to investigate whether rank one changes can be 
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Imaging Result 



Fig. 4: Reconstructed Image plus the Residual Image 


combined with the Krylov based techniques. 


Appendix A 

Upper Bounds on Image Powers 

To find the confidence intervals for the dirty images we need to find estimates for the variance of both 
matched filter and MVDR dirty images. In our problem the sample covariance matrix is obtained by 
squaring samples from a Gaussian process. This means that iVR ~ W P (R, N) where W P (R, N ) is the 
Wishart distribution function of order p with expected value equal to R and N degrees of freedom. For 
any deterministic vector 

nc h BjC ~ C H ~&C x\N). (100) 

where y 2 (A r ) is the standard x 2 distribution with N degrees of freedom. In radio astronomical applications 
N is usually very large and we can approximate this x 2 distribution with a Gaussian such that £^R£ ~ 
A/^C^RC {( n RC) 2 /N). The variance of the matched filter dirty image is given by 

Var(<7 MF) j) = f fc Ra t ,fc) 2 

k 

Using this result we can find the x% confidence interval which results in an increase of the upper bound 
such that 


cr < <t M f + Ov / M^mf) 


(101) 
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where a is chosen depending on x. Requiring at most a single false detection on the entire image translate 
into a ~ 6 . 

When we estimate the MVDR dirty image from sample covariance matrices we need to be more careful, 
mainly because the result is biased and we need to correct for that bias. For each pixel of the MVDR 
dirty image obtained from sample covariance matrices we have 

K 

Vmvdr ,i = Kg(z ) = —- TTTf—l - 

111, iX U, R l, a i,k 

( 102 ) 


where g(Z) = 1/Z and Z = Yhk af*.R fc 1 a.,,/.. Using a perturbation model Z = Z 0 + A Z and a Taylor 
approximation we find 


Let Zq = £{Z } then £{AZj 
which means that we want 


however we have, 


where we have used £{R *} 
correction factor 


and 


s(Z) « £ - 

^0 zjq 


« =j(Z„ - A Z). 


(103) 


0 and £{g(Z)j « 1/Z 0 - We would like this estimate to be unbiased 

1 


£{g(z)} 


Efc a 5 R fc a ' 


(104) 


Zo — i } a z, 

NRk 1 

N-p 


2 ^ a A N _ a a 


N 


—XA R * Ia M 


N — p 


(105) 


—R 1 If26ll . So in order to remove this bias we need to scale it by a 


N-p 


c = 


N 


N — p 


(106) 


O'MVDR.i — CKg(Z). 


(107) 


Now we need to find an estimate for the variance of the MVDR dirty image. Using (11031) we see that 
the first order approximation of Var (g(Z)) ~ VarfZj/Z,}. We find Var(Z) using the independence of each 
snapshot so we can write 

Var(Z) = ^Var(af fc R- 1 a i)fe ). (108) 

k 


In order to find Var(af^Rfc a^) we need to use some properties of the complex inverse Wishart distribu¬ 
tion. A matrix has complex inverse Wishart distribution if it’s inverse has a complex Wishart distribution 
|[26l . Let us define an invertible matrix B as 


B — [a^j. B x ] (109) 

then X = (BR _ 1 B- H ')/iV has an inverse Wishart distribution because X -1 = 7V(B _i 3 RB _1 ) has a 
Wishart distribution. In this case Xu = (af fc R - 1 a i,k)/N also has an inverse Wishart distribution with 
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less degrees of freedom. The covariance of an inverse Wishart matrix is derived in Il26l . however because 
we are dealing only with one element, this results simplifies to 

N 2 


Var(iVXii) = 




R 1 a ) - 


-i.k J 


( 110 ) 


{N-p) 2 {N-p- 1) V ,; ’ fc 

The variance of the unbiased MVDR dirty image is thus given by 

Var(<j M vDR,i) = Var(CKg(Z)) 

~ (N-p-1) (Efca^R- 1 ^) 4 ' 

Now that we have the variance we can use the same method that we used for MF dirty image to find a 
and 

v < o'mvdr + a\/Var(<T M vDR) (HI) 


Appendix B 
Optimum Beamformer 

We have already defined the problem of finding the beamformer for optimum upper bound as 

w i opt = arg min w H Rw (112) 

w 

s.t.w H (l K o A;)(Ik o A t ) H w = 1 


Following standard optimization techniques we define the Lagrangian and take derivatives with respect to 
w and the Lagrange multiplier // and we find 


W = fiR 1 (I K O A;) (Ik O Ai) H w 

(113) 

1 = w h (I k o A,;)(Ik o Ai) H w 

(114) 

Because R is full-rank and (|ll4|) we can model w as 


0 

i—i 

i 

ffS 

a. 

£ 

(115) 

Filling back into (|113|) we have 


/tR -1 (Ik o Aj)x 

= ^R-^Ik o A*) (Ik o Aj) h R _1 (Ik o A,;)x 

(116) 

and 

(I A' ° A j)x 

= fi (Ik ° A;) (Ik o A*) h R -1 (Ik o A. t )x 

(117) 

multiplying both sides by (Ik o A,)^ we get 


x = //(Ik ° Aj) H R _1 (lK o Aj)x. 

(118) 

Doing the same for (|ll4|) we have 


/i 2 x H (lK o Aj)^R _1 (lK o AO (lx o Aj) h R~ 1 (Ik o Aj)x 
= 1. 

(119) 

Now we use (|118|) and we find 


x H x = 1 

(120) 
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which makes finding x an eigenvalue problem. By taking a closer look at the matrix (ItfoA^RrHltfoAO 
we find that this matrix is diagonal 


(ItfoA^R-^ItfoAO 

afiRr 1 ^,! o 

0 af 2 R 2 


&i, 2 


0 


0 

0 

0 a^ K H K a i t K_ 


( 121 ) 


and hence x = e m is an elementary vector with all entries equal to zero except for mth entry which equals 
unity, m is the index corresponding to largest eigenvalue, A max , and from (11181) we have /./ = 1/A max . Filling 
back for w we find ^ 

Wi,opt = y. | R (©m ® a i,m) (122) 

and the output of the beamformer 


&opt 


w h Rw 

vv *,opt At " vv 


R 


i,opt 

-1„ 


\ H ^ „ 

i,m '‘m i,m 


a 


H R 1 


■i,m A S?k i,mj 

1 

T) In . 

<x i,m ±Xj m <*1,771 


mm 

k 


affcRfc ' a 


^,/c 


(123) 
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